Similar resources
Playing the Right Atari
We experimented with a simple yet powerful optimization for Monte-Carlo Go tree search. It consists of dealing appropriately with strings that have two liberties. The heuristic fits in one page of code, and the Go program that uses it improves from 50% of games won against Gnugo 3.6 to 76%.
Playing Atari with Deep Reinforcement Learning
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learnin...
Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep RL problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as ...
Emergent Tangled Graph Representations for Atari Game Playing Agents
Organizing code into coherent programs and relating different programs to each other represents an underlying requirement for scaling genetic programming to more difficult task domains. Assuming a model in which policies are defined by teams of programs, in which team and program are represented using independent populations and coevolved, has previously been shown to support the development of...
Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay
This paper introduces a novel method for learning how to play the most difficult Atari 2600 games from the Arcade Learning Environment using deep reinforcement learning. The proposed method, called human checkpoint replay, consists in using checkpoints sampled from human gameplay as starting points for the learning process. This is meant to compensate for the difficulties of current exploration...
Journal
Journal title: Autonomous Agents and Multi-Agent Systems
Year: 2021
ISSN: 1387-2532, 1573-7454
DOI: 10.1007/s10458-021-09497-8